MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding. | CVPR | 2023 | 0 |
Using Language to Extend to Unseen Domains. | ICLR | 2023 | 0 |
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. | ECCV | 2022 | 2 |
On Guiding Visual Attention with Language Specification. | CVPR | 2022 | 2 |
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension. | ACL | 2022 | 12 |
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. | ECCV | 2022 | 5 |
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency. | ECCV | 2022 | 0 |
DETReg: Unsupervised Pretraining with Region Priors for Object Detection. | CVPR | 2022 | 0 |
Object-Region Video Transformers. | CVPR | 2022 | 0 |
K-LITE: Learning Transferable Visual Models with External Knowledge. | NIPS/NeurIPS | 2022 | 0 |
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens. | NIPS/NeurIPS | 2022 | 0 |
How Much Can CLIP Benefit Vision-and-Language Tasks? | ICLR | 2022 | 0 |
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media. | EMNLP | 2021 | 16 |
CLIP-It! Language-Guided Video Summarization. | NIPS/NeurIPS | 2021 | 23 |
Compositional Video Synthesis with Action Graphs. | ICML | 2021 | 0 |
Identity-Aware Multi-sentence Video Description. | ECCV | 2020 | 6 |
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules. | CVPR | 2020 | 17 |
Robust Change Captioning. | ICCV | 2019 | 57 |
Language-Conditioned Graph Networks for Relational Reasoning. | ICCV | 2019 | 116 |
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation. | ACL | 2019 | 70 |
Adversarial Inference for Multi-Sentence Video Description. | CVPR | 2019 | 0 |
Women Also Snowboard: Overcoming Bias in Captioning Models. | ECCV | 2018 | 324 |
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. | CVPR | 2018 | 287 |
Speaker-Follower Models for Vision-and-Language Navigation. | NIPS/NeurIPS | 2018 | 307 |
Textual Explanations for Self-Driving Vehicles. | ECCV | 2018 | 155 |
Video Object Segmentation with Language Referring Expressions. | ACCV | 2018 | 67 |
Object Hallucination in Image Captioning. | EMNLP | 2018 | 126 |
Fooling Vision and Language Models Despite Localization and Attention Mechanism. | CVPR | 2018 | 0 |
Generating Descriptions with Grounded and Co-referenced People. | CVPR | 2017 | 54 |
Gradient-free Policy Architecture Search and Adaptation. | CoRL | 2017 | 24 |
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering. | CVPR | 2017 | 0 |
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. | EMNLP | 2016 | 1218 |
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags. | AAAI | 2016 | 30 |
Grounding of Textual Phrases in Images by Reconstruction. | ECCV | 2016 | 0 |
A Dataset for Movie Description. | CVPR | 2015 | 360 |